Sketch ⋆-metric: Comparing Data Streams via Sketching

Authors

  • Emmanuelle Anceaume
  • Yann Busnel

Abstract

In this paper, we consider the problem of estimating the distance between any two large data streams under small-space constraints. This problem is of utmost importance in data-intensive monitoring applications where input streams are generated rapidly. These streams need to be processed on the fly and accurately, so that any deviance from nominal behavior is detected quickly. We present a new metric, the Sketch ⋆-metric, which allows one to define a distance between updatable summaries (or sketches) of large data streams. An important feature of the Sketch ⋆-metric is that, given a measure on the entire initial data streams, the Sketch ⋆-metric preserves the axioms of the latter measure on the sketch (such as non-negativity, identity, symmetry, and the triangle inequality, but also specific properties of the f-divergence). Extensive experiments conducted on both synthetic traces and real data allow us to validate the robustness and accuracy of the Sketch ⋆-metric.
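The abstract describes comparing two streams through their sketches rather than through the streams themselves. The Python code below is a minimal sketch of that general idea under stated assumptions; it is not the authors' construction. It assumes a Count-Min-style summary with `k` rows of `c` buckets, uses the Hellinger distance as the base measure, and compares two sketches by taking the maximum row-wise distance. The class `BucketSketch` and the functions `hellinger`, `sketch_distance`, and `exact_distance` are hypothetical names chosen for illustration. Because each row is a coarsening of the item-frequency distribution, and the squared Hellinger distance is an f-divergence that cannot increase under such coarsening, no row (and hence not the maximum) can exceed the exact distance.

```python
import math
import random
from collections import Counter

# Assumption: a simple universal hash family h(x) = ((a*x + b) mod PRIME) mod c over integer items.
PRIME = 2_147_483_647  # 2^31 - 1


class BucketSketch:
    """Hypothetical Count-Min-style summary: k rows, each hashing items into c buckets."""

    def __init__(self, k: int, c: int, seed: int = 0):
        rng = random.Random(seed)  # same seed => same hash functions, so rows are comparable
        self.c = c
        self.hashes = [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(k)]
        self.rows = [[0] * c for _ in range(k)]

    def _bucket(self, row: int, item: int) -> int:
        a, b = self.hashes[row]
        return ((a * item + b) % PRIME) % self.c

    def update(self, item: int, count: int = 1) -> None:
        # Each row adds the item's count to the bucket its hash function selects.
        for r, row in enumerate(self.rows):
            row[self._bucket(r, item)] += count


def hellinger(p, q) -> float:
    """Hellinger distance between two count vectors, normalized to distributions."""
    sp, sq = sum(p), sum(q)
    s = sum((math.sqrt(x / sp) - math.sqrt(y / sq)) ** 2 for x, y in zip(p, q))
    return math.sqrt(s / 2)


def sketch_distance(s1: BucketSketch, s2: BucketSketch, dist=hellinger) -> float:
    """Assumption: compare row by row and keep the largest row-wise distance."""
    return max(dist(r1, r2) for r1, r2 in zip(s1.rows, s2.rows))


def exact_distance(stream1, stream2, dist=hellinger) -> float:
    """Reference value computed on the full item frequencies (not small-space)."""
    f1, f2 = Counter(stream1), Counter(stream2)
    support = sorted(set(f1) | set(f2))
    return dist([f1[x] for x in support], [f2[x] for x in support])


if __name__ == "__main__":
    rng = random.Random(42)
    stream_a = [rng.randrange(1000) for _ in range(20_000)]
    stream_b = [rng.randrange(500) for _ in range(20_000)]
    sa, sb = BucketSketch(k=5, c=64, seed=7), BucketSketch(k=5, c=64, seed=7)
    for x in stream_a:
        sa.update(x)
    for x in stream_b:
        sb.update(x)
    print("exact :", exact_distance(stream_a, stream_b))
    print("sketch:", sketch_distance(sa, sb))
```

Taking the maximum over rows is a design choice in this sketch: each row can only under-estimate the exact distance, so the largest row value is the tightest of the `k` lower bounds. Both sketches must be built with the same seed so that corresponding rows use identical hash functions.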

Similar resources

Sketch ⋆-metric: Comparing Data Streams via Sketching (Research Report)

Comparing Data Streams via Sketching

Corrections to “LD-Sketch: A Distributed Sketching Design for Accurate and Scalable Anomaly Detection in Network Data Streams”

In this article, we describe the corrections to our paper “LD-Sketch: A Distributed Sketching Design for Accurate and Scalable Anomaly Detection in Network Data Streams” published at IEEE INFOCOM 2014. We also clarify the complexity issue raised by some readers. Section 1: Corrections to Lemmas and Theorems.

Improved Sketching of Hamming Distance with Error Correcting

We address the problem of sketching the Hamming distance of data streams. We present a new notion of sketching, Fixable sketches, and we show that using such a sketch we not only reduce the sketch size but also restore the differences between the streams. Our contribution: for two streams with Hamming distance bounded by k, we show a sketch of size O(k log n) with O(log n) processing time ...

Sketch-Based Multi-query Processing over Data Streams

Recent years have witnessed an increasing interest in designing algorithms for querying and analyzing streaming data (i.e., data that is seen only once in a fixed order) with only limited memory. Providing (perhaps approximate) answers to queries over such continuous data streams is a crucial requirement for many application environments; examples include large telecom and IP network installati...

Journal:
  • CoRR

Volume: abs/1207.6465

Publication date: 2012